Data Preprocessing: A Milestone of Web Usage Mining
Abstract
The Internet today is full of structured and unstructured information, and this information directly or indirectly influences society and individuals, because the Internet is now part of our daily activity. Yet using this abundant and ambiguous information efficiently for decision making remains a major challenge. Everything we do while surfing the web is recorded, whether online shopping, blogging, tweeting, or chatting. Web servers use log files to collect information about every access a user makes to the server. Log data is usually noisy and ambiguous. Web usage mining (WUM), also known as web log mining, is the application of data mining techniques to large web log databases to discover knowledge about user behavior patterns and website usage statistics that can be used for various website design tasks. WUM consists of three phases: preprocessing, pattern discovery, and pattern analysis. Ordinary log files are very large, noisy, and unclear, and contain much redundancy, so the log data must be preprocessed for the web usage mining process to be efficient. The results of preprocessing also influence the later phases of web usage mining, which makes preprocessing of server log files a significant step. This study analyzes, compares, and contrasts the available preprocessing techniques and considers how work at the preprocessing level can be more focused and guided. In this paper we give a complete analysis of preprocessing by reviewing the existing work on the preprocessing stage.
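As a rough illustration of what preprocessing a server log can involve, the sketch below cleans log entries and groups them into sessions. It is not taken from the paper: the Common Log Format input, the list of file extensions treated as noise, the use of the host address as a stand-in for the user, and the 30-minute session timeout are all assumptions made for the example, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries follow the Common Log Format:
# host ident user [time] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumption: requests for these resource types are noise, not page views.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
# Assumption: a gap longer than this starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "url": m["url"],
        "status": int(m["status"]),
    }


def is_relevant(entry):
    """Data cleaning: keep successful GET requests for non-static resources."""
    path = entry["url"].split("?", 1)[0].lower()
    return (
        entry["method"] == "GET"
        and 200 <= entry["status"] < 300
        and not path.endswith(STATIC_EXTENSIONS)
    )


def sessionize(entries):
    """Group requests per host into sessions split by SESSION_TIMEOUT."""
    sessions = []
    last_seen = {}  # host -> (time of last request, URL list of open session)
    for e in sorted(entries, key=lambda e: e["time"]):
        prev = last_seen.get(e["host"])
        if prev is None or e["time"] - prev[0] > SESSION_TIMEOUT:
            current = []
            sessions.append((e["host"], current))
        else:
            current = prev[1]
        current.append(e["url"])
        last_seen[e["host"]] = (e["time"], current)
    return sessions


def preprocess(lines):
    """Parse, clean, and sessionize raw log lines."""
    parsed = (parse_line(line) for line in lines)
    cleaned = [e for e in parsed if e is not None and is_relevant(e)]
    return sessionize(cleaned)
```

Identifying users by host address alone is a simplification; real logs often need the user agent or cookies as well, since several users can share one address.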
Similar resources
A Survey on Preprocessing Methods for Web Usage Data
The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users' accesses are recorded in web logs. Because of the tremendous usage of the web, web log files are growing at a faster rate and their size is becoming huge. Web data mining is the applicati...
An Algorithmic Approach to Data Preprocessing in Web Usage Mining
Web usage mining is an area of web mining that deals with the extraction of interesting knowledge from logging information produced by web servers. Different data mining techniques can be applied to web usage data to extract user access patterns, and this knowledge can be used in a variety of applications such as system improvement, web site modification, business intelligence, etc. Web usage minin...
Sessionization: A Vital Stage in Data Preprocessing of Web Usage Mining. A Survey
The World Wide Web has had an impact on almost every aspect of our lives in the modern era. The Web has many unique characteristics, which make mining useful information and knowledge a challenging task. Web mining uses many data mining techniques, but it is not an application of traditional data mining because of the heterogeneity and unstructured nature of the data on the Web. Web mining tasks can be categorize...
An Efficient Algorithm for Data Cleaning of Log File using File Extensions
The World Wide Web is a monolithic repository of web pages that provides Internet users with heaps of information. With the growth in the number and complexity of websites, the size of the web has become massively large. Web usage mining is a division of web mining that involves the application of mining techniques to web server logs in order to extract the behavior of users. A Web Usage Mining process com...
Web Usage Mining Tools & Techniques: A Survey
The quest for knowledge has led to new discoveries and inventions, which in turn has led to the improvement of various technologies. As the years passed, the World Wide Web became overloaded with information, and it became hard to retrieve data according to need. Web mining emerged to provide a solution to this problem. Web usage mining is a category of web mining. Web usage mining mainly deals with...
Publication date: 2015